Item Sets that Compress

نویسندگان

  • Arno Siebes
  • Jilles Vreeken
  • Matthijs van Leeuwen
چکیده

One of the major problems in frequent item set mining is the explosion of the number of results: it is difficult to find the most interesting frequent item sets. The cause of this explosion is that large sets of frequent item sets describe essentially the same set of transactions. In this paper we approach this problem using the MDL principle: the best set of frequent item sets is that set that compresses the database best. We introduce four heuristic algorithms for this task, and the experiments show that these algorithms give a dramatic reduction in the number of frequent item sets. Moreover, we show how our approach can be used to determine the best value for the min-sup threshold.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Compression Picks The Significant Item Sets

Finding a comprehensive set of patterns that truly captures the characteristics of a database is a complicated matter. Frequent item set mining attempts this, but low support levels often result in exorbitant amounts of item sets. Recently we showed that by using MDL we are able to select a small number of item sets that compress the data well [15]. Here we show that this small set is a good ap...

متن کامل

Similarity Data Item Set Approach: An Encoded Temporal Data Base Technique

Data mining has been widely recognized as a powerful tool to explore added value from large-scale databases. Finding frequent item sets in databases is a crucial in data mining process of extracting association rules. Many algorithms were developed to find the frequent item sets. This paper presents a summary and a comparative study of the available FP-growth algorithm variations produced for m...

متن کامل

CT-ITL : Efficient Frequent Item Set Mining Using a Compressed Prefix Tree with Pattern Growth

Discovering association rules that identify relationships among sets of items is an important problem in data mining. Finding frequent item sets is computationally the most expensive step in association rule discovery and therefore it has attracted significant research attention. In this paper, we present a more efficient algorithm for mining complete sets of frequent item sets. In designing ou...

متن کامل

Frequent Patterns that Compress

One of the major problems in frequent pattern mining is the explosion of the number of results, making it difficult to identify the interesting frequent patterns. In a recent paper [14] we have shown that an MDL-based approach gives a dramatic reduction of the number of frequent item sets to consider. Here we show that MDL gives similarly good reductions for frequent patterns on other types of ...

متن کامل

The Distinction of Hot Herbal Compress, Hot Compress, and Topical Diclofenac as Myofascial Pain Syndrome Treatment

This randomized controlled trial aimed to investigate the distinctness after treatment among hot herbal compress, hot compress, and topical diclofenac. The registrants were equally divided into groups and received the different treatments including hot herbal compress, hot compress, and topical diclofenac group, which served as the control group. After treatment courses, Visual Analog Scale and...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006